{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# 7. Input-output operations\n",
    "\n",
    "(c) 2019, Dr. Ramil Nugmanov; Dr. Timur Madzhidov; Ravil Mukhametgaleev\n",
    "\n",
    "Installation instructions of CGRtools package information and tutorial's files see on `https://github.com/cimm-kzn/CGRtools`\n",
    "\n",
    "NOTE: Tutorial should be performed sequentially from the start. Random cell running will lead to unexpected results. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import pkg_resources\n",
    "if pkg_resources.get_distribution('CGRtools').version.split('.')[:2] != ['3', '1']:\n",
    "    print('WARNING. Tutorial was tested on 3.1 version of CGRtools')\n",
    "else:\n",
    "    print('Welcome!')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# load data for tutorial\n",
    "from pickle import load\n",
    "from traceback import format_exc\n",
    "\n",
    "with open('reactions.dat', 'rb') as f:\n",
    "    reactions = load(f) # list of ReactionContainer objects\n",
    "\n",
    "r1 = reactions[0] # reaction\n",
    "cgr2 = ~r1  \n",
    "cgr2.reset_query_marks()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "*CGRtools.files* subpackage contains file readers and writers classes.\n",
    "\n",
    "## 7.1. MDL RDF reader\n",
    "\n",
    "**RDFread** class can be used for RDF files reading.\n",
    "Instance of this class is file-like object which support **iteration**, has a method **read()** for parsing all data and **context manager**.\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 7.1.1. Read file from disk"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from CGRtools.files import * # import all available readers and writers\n",
    "\n",
    "with RDFread('example.rdf') as f:\n",
    "    first = next(f)  # get first reaction using generator\n",
    "    data = f.read()  # read remaining reactions to list of ReactionContainers\n",
    "\n",
    "data = []\n",
    "with RDFread('example.rdf') as f:\n",
    "    for r in f:  # looping is supported. Useful for large files.\n",
    "        data.append(r)\n",
    "\n",
    "with RDFread('example.rdf') as f:\n",
    "    data = [r for r in f]  # list comprehensions application. Result is equivalent to f.read()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### OOP-stype Pathlib supported"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from pathlib import Path\n",
    "\n",
    "with RDFread(Path('example.rdf')) as r: # OOP style call\n",
    "    r = next(r)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### opened files supported\n",
    "RDF file should be opened in text mode"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "with open('example.rdf') as f, RDFread(f) as r:\n",
    "    r = next(r) # OOP style application"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 7.1.2. Transparent loading from archives and network\n",
    "Readers designed transparently support any type of data sources. \n",
    "\n",
    "Page https://cimm.kpfu.ru/seafile/f/aeaca685e3854ae2bbad/?dl=1 returns RDF file.\n",
    "\n",
    "Data sources should be file-like objects."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from requests import get\n",
    "from io import StringIO\n",
    "\n",
    "# get function return requested URL which has attribute text. \n",
    "# in example this text is whole RDF stored in single string.\n",
    "# RDFread does not support parsing of strings, but one can emulate files with data \n",
    "# instead of strings by using io.StringIO\n",
    "with StringIO(get('https://cimm.kpfu.ru/seafile/f/aeaca685e3854ae2bbad/?dl=1').text) as f, RDFread(f) as r:\n",
    "    r = next(r)\n",
    "    print(r, 'StringIO downloaded from network data')\n",
    "\n",
    "# python support gzipped data. This example shows how to work with compressed \n",
    "# data directly without decompressing them to disk.\n",
    "from gzip import open as gzip_open\n",
    "with gzip_open('example.rdf.gz', 'rt') as f, RDFread(f) as r:\n",
    "    r = next(r)\n",
    "    print(r, 'gzipped file')\n",
    "\n",
    "# zip-files also supported out of the box \n",
    "# zipped files can be opened only in binary mode. io.TextIOWrapper can be used for transparent decoding them into text\n",
    "from zipfile import ZipFile\n",
    "from io import TextIOWrapper\n",
    "with ZipFile('example.zip') as z, z.open('example.rdf') as c:\n",
    "    with TextIOWrapper(c) as f, RDFread(f) as r:\n",
    "        r = next(r)\n",
    "        print(r, 'zip archive')\n",
    "\n",
    "# tar-file reading example\n",
    "from tarfile import open as tar_open\n",
    "from io import TextIOWrapper\n",
    "with tar_open('example.tar.gz') as t:\n",
    "    c = t.extractfile('example.rdf')\n",
    "    with TextIOWrapper(c) as f, RDFread(f) as r:\n",
    "        r = next(r)\n",
    "        print(r, 'gzipped tar archive')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 7.2. Other Readers\n",
    "* SDFread - MOL, SDF files reader (versions v2000, v3000 are supported)\n",
    "* MRVread - ChemAxon MRV files reader (lxml parser is used)\n",
    "* SMILESread - SMILES strings files reader (coho backend used). Every row should start with new SMILES\n",
    "* INCHIread - INCHI strings files reader (INCHI trust backend used). Every row should start with new InChI\n",
    "\n",
    "All files except MRV should be opened in **text-mode**  \n",
    "MRV requires binary mode `open('/path/to/data.mrv', 'rb')`"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "with MRVread(open('example.mrv', 'rb')) as f:\n",
    "    mrv = next(f)\n",
    "mrv"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 7.3. File writers\n",
    "Export in following file formats is supported:\n",
    "* RDFwrite (v2000) - molecules and reactions  export in RDF format\n",
    "* SDFwrite (v2000) - molecules and CGR export in SDF format\n",
    "* MRVwrite - molecules and reactions export in MRV format\n",
    "\n",
    "Writers has the same API as readers. All writers work with text-files\n",
    "Writers has `write` method which accepts as argument single reaction, molecule or CGR object"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "with RDFwrite('out.rdf') as f: # context manager supported\n",
    "    for r in data:\n",
    "        f.write(r)\n",
    "# file out.rdf will be overriden"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "f = RDFwrite('out.rdf') # ongoing writing into a single file\n",
    "for r in data:\n",
    "    f.write(r)\n",
    "\n",
    "f.write(r1)\n",
    "\n",
    "f.close() # close file. Flushes Python writer buffers."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 7.4. CGR can be stored in MDL SDF and loaded from.\n",
    "\n",
    "White-paper with SDF-CGR specification is described in manusript Supporting Materials."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from CGRtools.files import *\n",
    "from io import StringIO\n",
    "\n",
    "with StringIO() as f,  SDFwrite(f) as w:\n",
    "    w.write(cgr2) # file writing in SDF format\n",
    "    mdl = f.getvalue() # get formatted file to print out\n",
    "print(mdl) # It is how CGR looks like. \n",
    "# Notice that most of field are conventional MOL fields, S-queries are used for dynamic bond and atom specification"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "with StringIO(mdl) as f,  SDFread(f) as r: # import SDF file with CGR\n",
    "    cgr3 = next(r)\n",
    "print(cgr3)\n",
    "print(type(cgr3))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 7.5. Pickle support\n",
    "\n",
    "CGRtools containers fully support pickle dumping and loading.\n",
    "\n",
    "Moreover backward compatability is declared since 3.0.  \n",
    "Any new version of library can load dumps created with older version.\n",
    "\n",
    "Pickle dumps are more compact than MDL files and could be used as temporal storage."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from pickle import loads, dumps"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "loads(dumps(r1)) # load reaction from Pickle dump"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "cgrtools",
   "language": "python",
   "name": "cgrtools"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.7.1"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
